Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add comments into the AST #14352

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

doorgan
Copy link
Contributor

@doorgan doorgan commented Mar 21, 2025

This implements AST comments and updates the formatter to look for comments in the AST instead of a separate comments list.

This adds a new :include_comments to Code.string_to_quoted/2 and Code.string_to_quoted!/2.

Comments are attached to AST nodes in three variants:

  • :leading_comments: Comments that are located before a node, or in the same line.

    Examples:

# This is a leading comment
foo # This one too
  • :trailing_comments: Comments that are located after a node, and before the end of the parent enclosing the node(or the root document).

    Examples:

foo
# This is a trailing comment
# This one too
  • :inner_comments: Comments that are located inside an empty node.

    Examples:

foo do
  # This is an inner comment
end

[
  # This is an inner comment
]

%{
  # This is an inner comment
}

A comment may be considered inner or trailing depending on wether the enclosing node is empty or not. For example, in the following code:

foo do
  # This is an inner comment
end

The comment is considered inner because the do block is empty. However, in the following code:

foo do
  bar
  # This is a trailing comment
end

The comment is considered trailing to bar because the do block is not empty.

In the case no nodes are present in the AST but there are comments, they are inserted into the root :__block__ node as :inner_comments.

Many of the ideas here come from Sourceror, which takes inspiration for the recast parser. In recast, comments are attached to nodes with the leading and trailing boolean flags. If a comment has both attributes set to false, it represents an "inner" comment. For the sake of clarity I decided to define it explicitly for the Elixir AST.
An additional difference is that recast stores all the comments(leading, trailing, inner) mixed together under a single comments field. I find that separating them to be cleaner and easier to work with, since you don't have to split or filter the list to get a certain kind of comment from a node.

Different from Sourceror which supports only :leading_comments and :trailing_comments, it introduces :inner_comments to remove the ambiguity between comments before the end keyword, and comments after an expression. This also becomes important when working on the formatter code, as it is very awkward to inject inner comments after the call args were already converted to algebra docs. The Sourceror implementation relies on the formatter behavior to produce ast and comments that the formatter would interpret correctly, meaning it can get away with a sub-optimal comment merging strategy; it only needs the formatter to work. This PR on the contrary tries to attach comments where they make the most sense, and updates the formatter accordingly instead of wrangling with comment line numbers.

The current implementation in this PR adds an extra traversal step after the code is initially parsed. Due to this, and due to the logic to merge comments and AST requiring precise line information in every AST nodes, it requires to also set up the following options:

[
  literal_encoder: &{:ok, {:__block__, &2, [&1]}},
  token_metadata: true
]

TODO

  • Add public documentation
  • Handle formatting of binary operator forms and binary operator chains
  • Fix cases from the formatter integration tests
  • Run the formatter on the Elixir codebase itself to ensure nothing changes

@doorgan doorgan force-pushed the doorgan/ast_comments branch from 3eb21c3 to 5dcb6e7 Compare March 27, 2025 23:13
include_comments = Keyword.get(opts, :include_comments, false)

Process.put(:code_formatter_comments, [])
opts = [preserve_comments: &preserve_comments/5] ++ opts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I told you privately that we could make preserve_comments: :ast an option but I now realized that preserve_comments is actually a private option, so the :include_comments option is better.

@doorgan doorgan force-pushed the doorgan/ast_comments branch 2 times, most recently from dc05897 to 518a43e Compare March 28, 2025 21:30
@doorgan doorgan force-pushed the doorgan/ast_comments branch from 518a43e to a1580da Compare March 31, 2025 14:53
@doorgan doorgan force-pushed the doorgan/ast_comments branch from a1580da to b12a9a3 Compare March 31, 2025 15:56
@doorgan
Copy link
Contributor Author

doorgan commented Apr 1, 2025

There's a few issues I still need to tackle and a few more cleanups are pending(mostly around the code for formatting interpolations and the repetition of comment docs generation in general), but I think this is generally ready for an initial round of reviews

There are 3 tests currently failing that are related to binary operators:

  1) test case with when and clause comment (Code.Formatter.IntegrationTest)
     test/elixir/code_formatter/integration_test.exs:298
     Assertion with == failed
     code:  assert IO.iodata_to_binary(Code.format_string!(good, opts)) == String.trim(good)
     left:  "case decomposition do\n  <<h, _::binary>> when h != ?< ->\n    decomposition =\n      decomposition\n      |> :binary.split(\" \", [:global])\n      |> Enum.map(&String.to_integer(&1, 16))\n\n    Map.put(dacc, String.to_integer(codepoint, 16), decomposition)\n\n  _ ->\n    dacc\nend"
     right: "case decomposition do\n  # Decomposition\n  <<h, _::binary>> when h != ?< ->\n    decomposition =\n      decomposition\n      |> :binary.split(\" \", [:global])\n      |> Enum.map(&String.to_integer(&1, 16))\n\n    Map.put(dacc, String.to_integer(codepoint, 16), decomposition)\n\n  _ ->\n    dacc\nend"
     stacktrace:
       test/elixir/code_formatter/integration_test.exs:299: (test)

  2) test comment inside operator with when (Code.Formatter.IntegrationTest)
     test/elixir/code_formatter/integration_test.exs:590
     Assertion with == failed
     code:  assert IO.iodata_to_binary(Code.format_string!(bad, opts)) == result
     left:  "raise # Comment\n      function(x) ::\n        any"
     right: "# Comment\nraise function(x) ::\n        any"
     stacktrace:
       test/elixir/code_formatter/integration_test.exs:597: (test)

  3) test first argument in a call without parens with comments (Code.Formatter.IntegrationTest)
     test/elixir/code_formatter/integration_test.exs:468
     Assertion with == failed
     code:  assert IO.iodata_to_binary(Code.format_string!(good, opts)) == String.trim(good)
     left:  "with bar ::\n       :ok\n       | :invalid\n       | :other"
     right: "with bar ::\n       :ok\n       | :invalid\n       # | :unknown\n       | :other"
     stacktrace:
       test/elixir/code_formatter/integration_test.exs:469: (test)

Once those are fixed, running make format should produce the same results as the formatter would before this change. I found a few scenarios where newlines may be added/removed for some trailing comments but that issue should be relatively easy to fix once I figure it out.

@doorgan doorgan marked this pull request as ready for review April 1, 2025 02:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants